Predict Bike Sharing Demand with AutoGluon Template

Project: Predict Bike Sharing Demand with AutoGluon

This notebook is a template with each step that you need to complete for the project.

Please fill in your code where there are explicit ? markers in the notebook. You are welcome to add more cells and code as you see fit.

Once you have completed all the code implementations, please export your notebook as an HTML file so the reviewers can view your code. Make sure all cell outputs are displayed correctly.

File -> Export Notebook As... -> Export Notebook as HTML

There is also a writeup to complete after all code implementation is done. Please answer all questions and attach the necessary tables and charts. You can complete the writeup in either markdown or PDF.

Completing the code template and writeup template will cover all of the rubric points for this project.

The rubric contains "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. These suggestions are optional. If you decide to pursue them, you can include the code in this notebook and discuss the results in the writeup file.

Step 1: Create an account with Kaggle

Create Kaggle Account and download API key

Below is an example of the steps to get the API username and key. Each student will have their own username and key.

  1. Open account settings.

  2. Scroll down to API and click Create New API Token.

  3. Open up kaggle.json and use the username and key.

Step 2: Download the Kaggle dataset using the kaggle python library

Open up SageMaker Studio and use the starter template

  1. Notebook should be using an ml.t3.medium instance (2 vCPU + 4 GiB)
  2. Notebook should be using kernel: Python 3 (MXNet 1.8 Python 3.7 CPU Optimized)

Install packages

Setup Kaggle API Key
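As a minimal sketch of this step, the helper below writes the username and key from kaggle.json into the location the kaggle client reads by default (`~/.kaggle/kaggle.json`). The function name `write_kaggle_credentials` and its `kaggle_dir` parameter are hypothetical and exist only for illustration; the credential values are placeholders you must replace with your own.

```python
import json
import os
from pathlib import Path


def write_kaggle_credentials(username, key, kaggle_dir=None):
    """Write kaggle.json with the given credentials and return its path.

    kaggle_dir defaults to ~/.kaggle, where the kaggle client looks first.
    """
    kaggle_dir = Path(kaggle_dir) if kaggle_dir else Path.home() / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    cred_path = kaggle_dir / "kaggle.json"
    cred_path.write_text(json.dumps({"username": username, "key": key}))
    # Restrict the file to the current user; the kaggle client warns otherwise.
    os.chmod(cred_path, 0o600)
    return cred_path


# Example call (placeholders, not real credentials):
# write_kaggle_credentials("FILL_IN_USERNAME", "FILL_IN_KEY")
```

Keep the key out of the exported HTML by clearing the cell or its output before you export the notebook.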

Download and explore dataset

Go to the bike sharing demand competition and agree to the terms
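Once the files are downloaded, a first look at the data might resemble the sketch below. It assumes the competition's train.csv layout (a `datetime` column followed by weather and calendar columns, with `casual`, `registered`, and `count` at the end). The two sample rows and the helper name `load_bike_csv` are illustrative only; in the notebook you would pass the path of the downloaded file instead of the in-memory sample.

```python
import io

import pandas as pd

# Two illustrative rows in the assumed train.csv layout.
SAMPLE_CSV = """datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,3,13,16
2011-01-01 01:00:00,1,0,0,1,9.02,13.635,80,0.0,8,32,40
"""


def load_bike_csv(path_or_buffer):
    """Read a bike sharing CSV and parse the datetime column as timestamps."""
    return pd.read_csv(path_or_buffer, parse_dates=["datetime"])


train = load_bike_csv(io.StringIO(SAMPLE_CSV))
print(train.dtypes)      # column types, datetime should be datetime64
print(train.describe())  # summary statistics for the numeric columns
```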

Step 3: Train a model using AutoGluon’s Tabular Prediction

Requirements:

Review AutoGluon's training run with ranking of models that did the best.
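A possible shape for the training and review cells is sketched below. The label `count`, the RMSE metric, the ignored `casual` and `registered` columns, the 600-second time limit, and the `best_quality` preset are all assumptions for illustration, not requirements taken from this template. The helper names are hypothetical; the AutoGluon calls themselves (`TabularPredictor`, `fit`, `fit_summary`, `leaderboard`) are part of its tabular API.

```python
def train_baseline(train_df, label="count", time_limit=600, presets="best_quality"):
    """Fit an AutoGluon TabularPredictor on train_df and return it."""
    # Imported lazily so the cell can be defined before AutoGluon is installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(
        label=label,
        eval_metric="root_mean_squared_error",
        # Assumption: these columns are not available at prediction time.
        learner_kwargs={"ignored_columns": ["casual", "registered"]},
    ).fit(train_data=train_df, time_limit=time_limit, presets=presets)
    return predictor


def review_training(predictor):
    """Print the fit summary and return the model ranking as a DataFrame."""
    predictor.fit_summary()
    return predictor.leaderboard()
```

A typical use would be `predictor = train_baseline(train)` followed by `review_training(predictor)`, which lists the trained models ordered by validation score.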

Create predictions from test dataset

NOTE: Kaggle will reject the submission if any prediction is negative, so set all negative predictions to 0.
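One way to handle negative predictions is to count them and then clip them at zero, as in this small sketch. The sample values are made up, and `clip_negative` is a hypothetical helper name.

```python
import pandas as pd


def clip_negative(predictions):
    """Return the predictions as a Series with negative values replaced by 0."""
    return pd.Series(predictions).clip(lower=0)


raw = pd.Series([12.5, -3.2, 0.0, 40.1])  # made-up predictions
clean = clip_negative(raw)
print((raw < 0).sum(), "negative prediction(s) before clipping")
print(clean.tolist())
```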

Set predictions to submission dataframe, save, and submit
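The sketch below shows one way to assemble and save the submission file. It assumes the submission has exactly two columns, `datetime` and `count`; the helper `build_submission` and the default file name are hypothetical. In the notebook you may instead load the provided sample submission and overwrite its `count` column.

```python
import pandas as pd


def build_submission(datetimes, predictions, path="submission.csv"):
    """Create the submission frame, write it to path without the index, and return it."""
    submission = pd.DataFrame({"datetime": datetimes, "count": predictions})
    submission.to_csv(path, index=False)
    return submission
```

After saving, the file can be sent with the kaggle command line client, for example `kaggle competitions submit -c bike-sharing-demand -f submission.csv -m "first raw submission"`, where the message text is up to you.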

View submission via the command line or in the web browser under the competition's page - My Submissions

Initial score of 1.39920
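For context on how to read this number: the competition's evaluation page describes the score as root mean squared logarithmic error (RMSLE), so lower is better. The function below is a minimal sketch of that formula for checking predictions locally; it is not Kaggle's own scoring code.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; inputs must be non-negative."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p(x) = log(1 + x), which keeps zero counts well defined.
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))
```

Because the error is taken on a log scale, the metric penalizes relative rather than absolute differences, and it is undefined for predictions below -1, which is one reason negative predictions need to be handled before submitting.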

Step 4: Exploratory Data Analysis and Creating an additional feature

Make category types for these so models know they are not just numbers
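A sketch of this step follows. It assumes the additional feature is the hour of day extracted from `datetime`, and that the columns to convert are `season` and `weather`; the template does not name them, so adjust the list to whatever your analysis supports. `add_features` is a hypothetical helper.

```python
import pandas as pd


def add_features(df, categorical_cols=("season", "weather")):
    """Return a copy of df with an hour column and categorical dtypes."""
    out = df.copy()
    out["datetime"] = pd.to_datetime(out["datetime"])
    # New feature: hour of day, since demand usually varies within a day.
    out["hour"] = out["datetime"].dt.hour
    # Mark coded columns as categories so models do not treat them as ordered numbers.
    for col in categorical_cols:
        out[col] = out[col].astype("category")
    return out


demo = pd.DataFrame(
    {
        "datetime": ["2011-01-01 05:00:00", "2011-01-01 17:00:00"],
        "season": [1, 1],
        "weather": [1, 2],
    }
)
features = add_features(demo)
print(features.dtypes)
```

Apply the same function to both the train and test frames so their columns and dtypes stay aligned.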

Step 5: Rerun the model with the same settings as before, just with more features

New Score of 0.47165

Step 6: Hyperparameter optimization
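The sketch below shows one possible way to pass a search space and tuning settings to AutoGluon. The model choice (LightGBM via the `GBM` key), the ranges, and the number of trials are illustrative assumptions, not the values used for the score reported here. The location of the search-space classes has moved between AutoGluon releases, so the import is attempted in two places; check the documentation for the version in your kernel.

```python
# Tuning settings: 5 trials, run locally, searcher chosen by AutoGluon.
HPO_KWARGS = {"num_trials": 5, "scheduler": "local", "searcher": "auto"}


def build_search_space():
    """Return a hyperparameters dict with an illustrative LightGBM search space."""
    try:
        from autogluon.common import space  # newer releases
    except ImportError:
        import autogluon.core as ag  # older releases

        space = ag.space
    return {
        "GBM": {
            "num_boost_round": 100,
            # Illustrative range, not a recommendation.
            "num_leaves": space.Int(lower=26, upper=66, default=36),
        }
    }


def train_with_hpo(train_df, label="count", time_limit=600):
    """Fit a predictor with hyperparameter tuning enabled and return it."""
    from autogluon.tabular import TabularPredictor

    return TabularPredictor(label=label, eval_metric="root_mean_squared_error").fit(
        train_data=train_df,
        time_limit=time_limit,
        hyperparameters=build_search_space(),
        hyperparameter_tune_kwargs=HPO_KWARGS,
    )
```

Restricting `hyperparameters` to a single model family narrows the ensemble AutoGluon can build, so a tuned run is not guaranteed to beat a default run with the same time limit.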

New Score of 0.50893

Step 7: Write a Report

Refer to the markdown file for the full report

Creating plots and tables for the report
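As a starting point for the report, the three Kaggle scores recorded in this notebook can be collected in one table. The run labels are hypothetical names derived from the step titles; the scores are the ones stated above. The sketch treats a lower score as better, which holds if the competition metric is RMSLE.

```python
import pandas as pd

scores = pd.DataFrame(
    {
        "run": ["initial", "new features", "hyperparameter optimization"],
        "score": [1.39920, 0.47165, 0.50893],
    }
)
# Lower is better for an error metric, so the best run has the minimum score.
best_run = scores.loc[scores["score"].idxmin(), "run"]
print(scores)
print("Best run:", best_run)
```

Under that reading, the run with added features scores better than the hyperparameter-optimized run, which is worth discussing in the writeup.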

Hyperparameter table

Plot time series of train and test data
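One way to draw this plot is sketched below with matplotlib. It assumes both frames have `datetime` and `count` columns, with `count` holding observed values for the training data and predicted values for the test data. `plot_train_test` is a hypothetical helper.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_train_test(train, test_pred, ax=None):
    """Plot observed train counts and predicted test counts on one time axis."""
    if ax is None:
        _, ax = plt.subplots(figsize=(12, 4))
    ax.plot(train["datetime"], train["count"], label="train (observed)")
    ax.plot(test_pred["datetime"], test_pred["count"], label="test (predicted)")
    ax.set_xlabel("datetime")
    ax.set_ylabel("count")
    ax.legend()
    return ax
```

Calling `plot_train_test(train, submission)` in a notebook cell displays the figure inline; use `ax.figure.savefig("train_test.png")` if you want an image file for the report.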

Prediction with XGBoost